MACS 30100
University of Chicago
Choose splits that minimize the sum of squared errors
\[\sum_{j=1}^J \sum_{i \in R_j} (y_i - \hat{y}_{R_j})^2\]
Continue splitting until a stopping criterion is reached
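The sum of squared errors above can be evaluated for any candidate partition, which is all a greedy split search needs. The following is a minimal sketch for a single predictor, assuming midpoints between sorted values as candidate cutpoints; the function names and the toy data are hypothetical and not taken from the course materials.

```python
from statistics import mean


def region_rss(y_values):
    """Squared deviations from the region mean, which is the region's prediction."""
    y_hat = mean(y_values)
    return sum((y - y_hat) ** 2 for y in y_values)


def total_rss(regions):
    """Sum the per-region RSS over all regions R_1, ..., R_J."""
    return sum(region_rss(ys) for ys in regions)


def best_split(x, y):
    """Try each midpoint between adjacent sorted x values; return (cutpoint, rss)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutpoint between identical x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yi for xi, yi in pairs if xi < cut]
        right = [yi for xi, yi in pairs if xi >= cut]
        rss = total_rss([left, right])
        if best is None or rss < best[1]:
            best = (cut, rss)
    return best


# Hypothetical toy data with an obvious jump between x = 3 and x = 10
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
cut, rss = best_split(x, y)
```

Applying `best_split` recursively inside each resulting region, until a stopping rule such as a minimum node size is met, gives the usual top-down greedy procedure.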
Classification error rate
\[E = 1 - \max_{k}(\hat{p}_{mk})\]
Gini index
\[G = \sum_{k = 1}^K \hat{p}_{mk} (1 - \hat{p}_{mk})\]
Cross-entropy
\[D = - \sum_{k = 1}^K \hat{p}_{mk} \log(\hat{p}_{mk})\]
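The three node impurity measures translate directly into code. This is a sketch that takes the class proportions of one node as a list; the function names are illustrative, and terms with a zero proportion are skipped in the cross-entropy on the usual convention that 0 log 0 = 0.

```python
import math


def classification_error(p):
    """E = 1 - max_k p_mk."""
    return 1 - max(p)


def gini(p):
    """G = sum_k p_mk (1 - p_mk)."""
    return sum(pk * (1 - pk) for pk in p)


def cross_entropy(p):
    """D = -sum_k p_mk log(p_mk), skipping empty classes (0 log 0 treated as 0)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


pure_node = [1.0, 0.0]   # every observation in one class
mixed_node = [0.5, 0.5]  # evenly split between two classes
```

All three measures are zero for a pure node and largest for an evenly mixed node.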
| Outcome | Number of training observations |
|---|---|
| Died | 344 |
| Survived | 72 |
| Less than 24.75 years old | Outcome | Number of training observations |
|---|---|---|
| FALSE | Died | 232 |
| FALSE | Survived | 60 |
| TRUE | Died | 112 |
| TRUE | Survived | 12 |
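The counts in the two tables above make a useful worked example. Both child nodes still predict "Died", so the classification error rate stays at 72/416 (about 0.173) before and after the age split, while the weighted Gini index falls from about 0.286 to about 0.281. The sketch below reproduces that arithmetic; the helper names are hypothetical.

```python
def gini(counts):
    """Gini index of a node given its class counts."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)


def error_rate(counts):
    """Classification error rate of a node given its class counts."""
    return 1 - max(counts) / sum(counts)


def weighted(measure, nodes):
    """Average an impurity measure over child nodes, weighted by node size."""
    n = sum(sum(counts) for counts in nodes)
    return sum(sum(counts) / n * measure(counts) for counts in nodes)


# Counts are [Died, Survived], copied from the tables above
parent = [344, 72]
children = [
    [232, 60],  # not less than 24.75 years old
    [112, 12],  # less than 24.75 years old
]

error_before = error_rate(parent)
error_after = weighted(error_rate, children)
gini_before = gini(parent)
gini_after = weighted(gini, children)
```

This is one reason the Gini index or cross-entropy is usually preferred when growing a tree: they register gains in node purity that the classification error rate misses.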
Linear functional form
\[f(X) = \beta_0 + \sum_{j = 1}^p X_j \beta_j\]
Decision tree functional form
\[f(X) = \sum_{m = 1}^M c_m \cdot 1_{X \in R_m}\]
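The decision tree form is a sum of constants, each switched on by an indicator for its region. A minimal sketch follows, assuming the regions partition the feature space so that exactly one indicator equals 1 for any input; the regions and constants are made up for illustration.

```python
def make_tree_predictor(regions):
    """Build f(X) = sum_m c_m * 1[X in R_m] from (membership_test, c_m) pairs."""
    def predict(x):
        return sum(c_m * (1 if in_region(x) else 0) for in_region, c_m in regions)
    return predict


# Hypothetical regions and constants, not estimated from any data
regions = [
    (lambda x: x["age"] < 24.75, 0.10),
    (lambda x: x["age"] >= 24.75 and x["fare"] < 50, 0.20),
    (lambda x: x["age"] >= 24.75 and x["fare"] >= 50, 0.60),
]
f = make_tree_predictor(regions)
```

For example, `f({"age": 30, "fare": 80})` falls only in the third region and returns its constant, 0.60.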
Bootstrap aggregating
\[\hat{f}_{\text{avg}}(x) = \frac{1}{B} \sum_{b = 1}^B \hat{f}^b(x)\]
\[\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b = 1}^B \hat{f}^{*b}(x)\]
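Bagging can be sketched in a few lines: draw B bootstrap samples, fit one model per sample, and average the predictions. The example below assumes a one-split regression tree (a stump) as the base learner and simulated data. It is not the `randomForest` implementation, and all names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(data):
    """Fit a one-split regression tree on (x, y) pairs by minimizing RSS."""
    pairs = sorted(data)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < cut]
        right = [y for x, y in pairs if x >= cut]
        c_left, c_right = mean(left), mean(right)
        rss = sum((y - c_left) ** 2 for y in left) + sum((y - c_right) ** 2 for y in right)
        if best is None or rss < best[0]:
            best = (rss, cut, c_left, c_right)
    if best is None:  # every x in the sample is identical: predict the mean
        constant = mean(y for _, y in pairs)
        return lambda x: constant
    _, cut, c_left, c_right = best
    return lambda x: c_left if x < cut else c_right


def bagged_predictor(data, B, rng):
    """Average B stumps, each fit to a bootstrap sample of the data."""
    models = []
    for _ in range(B):
        sample = [rng.choice(data) for _ in data]  # n draws with replacement
        models.append(fit_stump(sample))
    return lambda x: sum(model(x) for model in models) / B


# Simulated data: a noisy step function (an assumption for illustration)
rng = random.Random(30100)
data = [(x, (1.0 if x < 5 else 4.0) + rng.gauss(0, 0.5)) for x in range(10)]
f_bag = bagged_predictor(data, B=200, rng=rng)
```

Because each stump places its split slightly differently, the averaged prediction changes more smoothly near the jump than any single stump does.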
##
## Call:
## randomForest(formula = Survived ~ ., data = titanic_rf_data, mtry = 7, ntree = 500)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 7
##
## OOB estimate of error rate: 20.6%
## Confusion matrix:
## Died Survived class.error
## Died 357 67 0.158
## Survived 80 210 0.276
## user system elapsed
## 0.291 0.005 0.297
## user system elapsed
## 3.570 0.133 3.759
##
## Call:
## randomForest(formula = Survived ~ ., data = titanic_rf_data, ntree = 500)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 18.4%
## Confusion matrix:
## Died Survived class.error
## Died 381 43 0.101
## Survived 88 202 0.303
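The OOB error rates and class errors in the output above follow directly from the confusion matrices, with rows read as actual classes and columns as predicted classes. The sketch below recomputes them. It also checks the number of variables tried per split, assuming the `randomForest` default for classification of the floor of the square root of the number of predictors, and assuming 7 predictors because the bagging call sets `mtry = 7`.

```python
import math


def error_rates(confusion):
    """Return (overall error, per-class error) from confusion[actual][predicted]."""
    total = sum(sum(row.values()) for row in confusion.values())
    wrong = sum(
        n
        for actual, row in confusion.items()
        for predicted, n in row.items()
        if predicted != actual
    )
    per_class = {
        actual: 1 - row[actual] / sum(row.values())
        for actual, row in confusion.items()
    }
    return wrong / total, per_class


# Counts copied from the output above
bagging = {
    "Died": {"Died": 357, "Survived": 67},
    "Survived": {"Died": 80, "Survived": 210},
}
forest = {
    "Died": {"Died": 381, "Survived": 43},
    "Survived": {"Died": 88, "Survived": 202},
}

bagging_error, bagging_class_error = error_rates(bagging)
forest_error, forest_class_error = error_rates(forest)

# Assumed default: floor(sqrt(p)) with p = 7 predictors
default_mtry = math.floor(math.sqrt(7))
```

The random forest lowers the overall error (131 of 714 misclassified versus 147 of 714) by improving on the "Died" class, while its error on the "Survived" class is slightly higher.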
| Variable used to split | Number of trees |
|---|---|
| Female | 500 |
| Variable used to split | Number of trees |
|---|---|
| Age | 39 |
| Embarked | 61 |
| Fare | 113 |
| Female | 120 |
| Parch | 31 |
| Pclass | 127 |
| SibSp | 9 |
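One way to read the two tables, assuming the counts are the number of trees (out of 500) that split on each variable first: when all 7 predictors are candidates, every tree picks the same dominant variable, but when only 2 of 7 are drawn at random, a given variable is available in only about 2/7 of the draws. The simulation below illustrates that candidate-set effect only. It does not fit any trees or reproduce the table.

```python
import random

variables = ["Age", "Embarked", "Fare", "Female", "Parch", "Pclass", "SibSp"]

# Counts copied from the second table above
split_counts = {
    "Age": 39, "Embarked": 61, "Fare": 113, "Female": 120,
    "Parch": 31, "Pclass": 127, "SibSp": 9,
}


def candidate_share(target, mtry, n_draws, rng):
    """Share of random draws of `mtry` variables that include `target`."""
    hits = sum(target in rng.sample(variables, mtry) for _ in range(n_draws))
    return hits / n_draws


rng = random.Random(30100)
share_all = candidate_share("Female", 7, 10_000, rng)      # every draw includes it
share_subset = candidate_share("Female", 2, 100_000, rng)  # close to 2/7
observed_share = split_counts["Female"] / sum(split_counts.values())
```

Restricting the candidate set is what lets other variables lead some trees, which decorrelates the trees being averaged.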
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Using OOB method...
## Using OOB method...
## Using OOB method...
| Depth | Optimal number of iterations |
|---|---|
| 1 | 3449 |
| 2 | 3210 |
| 4 | 2360 |
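The table reports an estimated optimal number of boosting iterations for each tree depth. The sketch below shows the same idea in a simplified setting: boosting stumps on the residuals with squared-error loss (not the Bernoulli loss reported above), and a held-out set standing in for the OOB estimate. The data, shrinkage value, and function names are assumptions for illustration.

```python
import math
from statistics import mean


def fit_stump(x, r):
    """Best single split of x for residuals r, chosen by RSS."""
    best = None
    for cut in sorted(set(x))[1:]:  # each cut leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi < cut]
        right = [ri for xi, ri in zip(x, r) if xi >= cut]
        c_left, c_right = mean(left), mean(right)
        rss = sum((ri - c_left) ** 2 for ri in left) + sum((ri - c_right) ** 2 for ri in right)
        if best is None or rss < best[0]:
            best = (rss, cut, c_left, c_right)
    _, cut, c_left, c_right = best
    return lambda xi: c_left if xi < cut else c_right


def boost(x, y, n_trees, shrinkage):
    """Start from f = 0; repeatedly fit a stump to residuals and shrink its update."""
    residuals = list(y)
    trees = []
    for _ in range(n_trees):
        tree = fit_stump(x, residuals)
        trees.append(tree)
        residuals = [ri - shrinkage * tree(xi) for xi, ri in zip(x, residuals)]
    return trees


def mse(trees, shrinkage, x, y, n_used):
    """Mean squared error using only the first n_used trees."""
    preds = [shrinkage * sum(t(xi) for t in trees[:n_used]) for xi in x]
    return mean((yi - pi) ** 2 for yi, pi in zip(y, preds))


# Hypothetical data: a smooth curve, split into training and held-out points
x_all = list(range(20))
y_all = [math.sin(xi / 3) for xi in x_all]
x_train, y_train = x_all[0::2], y_all[0::2]
x_valid, y_valid = x_all[1::2], y_all[1::2]

n_trees, shrinkage = 200, 0.1
trees = boost(x_train, y_train, n_trees, shrinkage)
valid_errors = [mse(trees, shrinkage, x_valid, y_valid, n) for n in range(n_trees + 1)]
best_n = min(range(n_trees + 1), key=valid_errors.__getitem__)
```

Training error can only fall as trees are added, so the iteration count has to be chosen on data the trees did not see, here by minimizing `valid_errors`.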